Automated document content characterization for a multimedia document retrieval system
Abstract
We propose a new approach to automating document image layout extraction for populating the features of an object-oriented database, using rapid low-level feature analysis, preclassification and predictive coding. The layout information, comprising region location and classification data, is transformed into ‘feature object(s)’. This information is then fed into an intelligent document image retrieval system (IDIR) to be used in document retrieval schemes. The IDIR system consists of a user interface, an object-oriented database and a variety of document image analysis algorithms. In this paper the object-oriented storage model and the database system are presented in formal and functional domains. Moreover, the graphical user interface and a visual document image browser are described. The document analysis techniques used in document characterization are also presented. In this context the documents consist of text, picture and other media (possibly embedded) data. Documents are stored in the database as document, page and region objects. Our test system has been implemented and tested using a document database of 10 000 documents.
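The document, page and region storage hierarchy mentioned in the abstract can be pictured with a minimal sketch. Everything below is an illustrative assumption and not the IDIR system's actual schema: the class names, the `bbox` and `label` fields, and the helper `find_documents` are hypothetical, and a plain Python list stands in for the object-oriented database.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Region:
    # Hypothetical feature object: where the region is and what it was classified as.
    bbox: Tuple[int, int, int, int]  # (left, top, right, bottom), assumed pixel units
    label: str                       # assumed class names such as "text" or "picture"


@dataclass
class Page:
    number: int
    regions: List[Region] = field(default_factory=list)


@dataclass
class Document:
    doc_id: str
    pages: List[Page] = field(default_factory=list)

    def regions_of(self, label: str) -> List[Tuple[int, Region]]:
        # Collect (page number, region) pairs whose classification matches the label.
        return [
            (page.number, region)
            for page in self.pages
            for region in page.regions
            if region.label == label
        ]


def find_documents(db: List[Document], label: str) -> List[str]:
    # Return the ids of documents that contain at least one region with the given label.
    return [doc.doc_id for doc in db if doc.regions_of(label)]
```

With objects of this shape, a layout-based query such as "documents containing a picture region" reduces to a filter over the stored region objects.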
Similar resources
Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback
Keyword spotting is a well-known method in document image retrieval. In this method, search in document images is based on a query word image. In this paper, an approach for document image retrieval based on keyword spotting is proposed. In the proposed method, a framework using relevance feedback is presented. Relevance feedback, an interactive and efficient method, is used in this paper to imp...
Improved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
Spoken Document Retrieval and Summarization
Huge, continually increasing quantities of multimedia content including speech information are filling up our computers, networks and lives. It is obvious that speech is one of the most important sources of information for multimedia content, as it is the speech of the content that tells us of the subjects, topics and concepts. As a result, the associated spoken documents of the multimedia cont...
INEX 2005 Multimedia Track
This paper reports on the activities of the INEX 2005 Multimedia track. The track was successful in realizing its objective to provide a pilot evaluation platform for the evaluation of retrieval strategies for XML-based multimedia documents. In this first exploratory year the focus of the evaluation experiment was to test approaches for the retrieval of XML fragments using a combination of cont...